 Zulia State


A Psychology-based Unified Dynamic Framework for Curriculum Learning

Meng, Guangyu, Zeng, Qingkai, Lalor, John P., Yu, Hong

arXiv.org Artificial Intelligence

Directly learning from examples of random difficulty levels is often challenging for both humans and machine learning models. A more effective strategy involves exposing learners to examples in a progressive order, from easy to difficult. Curriculum Learning (CL) has been proposed to implement this strategy in machine learning model training. However, two key challenges persist in CL framework design: defining the difficulty of training data and determining the appropriate amount of data to input at each training step. This paper presents a Psychology-based Unified Dynamic Framework for Curriculum Learning (PUDF), drawing inspiration from psychometrics. We quantify the difficulty of training data by applying Item Response Theory (IRT) to responses from Artificial Crowds (AC). This theory-driven IRT-AC approach leads to global (i.e., model-independent) and interpretable difficulty values. Leveraging IRT, we propose a Dynamic Data Selection via Model Ability Estimation (DDS-MAE) strategy to schedule the appropriate amount of data during model training. Since our difficulty labeling and model ability estimation are based on a consistent theory, namely IRT, their values are comparable within the same scope, potentially leading to faster convergence than other CL methods. Experimental results demonstrate that fine-tuning pre-trained language models with PUDF enhances their performance on the GLUE benchmark. Moreover, PUDF surpasses other state-of-the-art (SOTA) CL methods on the GLUE benchmark. We further explore the components of PUDF, namely the difficulty measurer (IRT-AC) and the training scheduler (DDS-MAE), qualitatively and quantitatively. Lastly, we conduct an ablation study to clarify which components of PUDF contribute to faster convergence and higher accuracy.
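
To make the abstract's key mechanism concrete, the sketch below (not the authors' code) illustrates the two ideas it describes: fitting a 1PL (Rasch) IRT model to binary responses from an artificial crowd of models to obtain per-example difficulty, and then selecting only examples whose difficulty does not exceed the current estimated model ability, in the spirit of IRT-AC and DDS-MAE. All data, shapes, and hyperparameters here are synthetic assumptions.

    # Illustrative sketch: 1PL (Rasch) IRT difficulty estimation plus ability-based
    # data selection. Responses are a synthetic binary matrix from an "artificial crowd".
    import numpy as np

    def fit_rasch(responses, n_iters=200, lr=0.05):
        """responses: (n_models, n_items) binary matrix; returns (ability, difficulty)."""
        n_models, n_items = responses.shape
        ability = np.zeros(n_models)      # theta_j per crowd model
        difficulty = np.zeros(n_items)    # b_i per training example
        for _ in range(n_iters):
            logits = ability[:, None] - difficulty[None, :]
            p = 1.0 / (1.0 + np.exp(-logits))        # P(correct response)
            grad = responses - p                      # gradient of the log-likelihood
            ability += lr * grad.sum(axis=1) / n_items
            difficulty -= lr * grad.sum(axis=0) / n_models
            difficulty -= difficulty.mean()           # identifiability constraint
        return ability, difficulty

    rng = np.random.default_rng(0)
    true_b = rng.normal(size=500)
    true_theta = rng.normal(size=40)
    probs = 1 / (1 + np.exp(-(true_theta[:, None] - true_b[None, :])))
    responses = rng.binomial(1, probs)

    theta_hat, b_hat = fit_rasch(responses)

    # DDS-MAE-style scheduling: at a given estimated ability, train only on examples
    # whose IRT difficulty is at or below that ability (re-estimated as training proceeds).
    model_ability = 0.3
    selected = np.where(b_hat <= model_ability)[0]
    print(f"{len(selected)} of {len(b_hat)} examples selected at ability {model_ability}")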


VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation

Song, Yixiao, Kim, Yekyung, Iyyer, Mohit

arXiv.org Artificial Intelligence

Existing metrics for evaluating the factuality of long-form text, such as FACTSCORE (Min et al., 2023) and SAFE (Wei et al., 2024), decompose an input text into "atomic claims" and verify each against a knowledge base like Wikipedia. These metrics are not suitable for most generation tasks because they assume that every claim is verifiable (i.e., can plausibly be proven true or false). We address this issue with VERISCORE, a metric for diverse long-form generation tasks that contain both verifiable and unverifiable content. VERISCORE can be effectively implemented with either closed or fine-tuned open-weight language models, and human evaluation confirms that VERISCORE's extracted claims are more sensible than those from competing methods across eight different long-form tasks. We use VERISCORE to evaluate generations from 16 different models across multiple long-form tasks and find that while GPT-4o is the best-performing model overall, open-weight models such as Mixtral-8x22 are closing the gap. We show that an LM's VERISCORE on one task (e.g., biography generation) does not necessarily correlate with its VERISCORE on a different task (e.g., long-form QA), highlighting the need for expanding factuality evaluation across tasks with varying fact density.
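
For intuition, here is a heavily simplified sketch of claim-level factuality scoring of the kind discussed above: decompose text into claims, keep only the verifiable ones, and aggregate per-claim verdicts. This is not the exact VERISCORE formula; claim extraction and verification, which the paper performs with language models and evidence retrieval, are stubbed out with hard-coded toy data.

    # Illustrative sketch only: a simplified claim-level factuality score.
    # Extraction and verification verdicts are assumed to come from upstream models.
    from dataclasses import dataclass

    @dataclass
    class Claim:
        text: str
        verifiable: bool   # VERISCORE scores only claims that can be checked
        supported: bool    # verdict from evidence retrieval + entailment

    def simple_claim_score(claims: list[Claim]) -> float | None:
        """Fraction of verifiable claims judged supported; None if nothing to verify."""
        verifiable = [c for c in claims if c.verifiable]
        if not verifiable:
            return None
        return sum(c.supported for c in verifiable) / len(verifiable)

    claims = [
        Claim("Maracaibo is in Venezuela.", verifiable=True, supported=True),
        Claim("The city is beautiful.", verifiable=False, supported=False),
        Claim("Maracaibo is the capital of Venezuela.", verifiable=True, supported=False),
    ]
    print(simple_claim_score(claims))  # 0.5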


Reconstructing Historical Climate Fields With Deep Learning

Bochow, Nils, Poltronieri, Anna, Rypdal, Martin, Boers, Niklas

arXiv.org Artificial Intelligence

Historical records of climate fields are often sparse due to missing measurements, especially before the introduction of large-scale satellite missions. Several statistical and model-based methods have been introduced to fill gaps and reconstruct historical records. Here, we employ a recently introduced deep-learning approach based on Fourier convolutions, trained on numerical climate model output, to reconstruct historical climate fields. Using this approach we are able to realistically reconstruct large and irregular areas of missing data, as well as reconstruct known historical events such as strong El Niño and La Niña with very little given information. Our method outperforms the widely used statistical kriging method as well as other recent machine learning approaches. The model generalizes to higher resolutions than the ones it was trained on and can be used on a variety of climate fields. Moreover, it allows inpainting of masks never seen before during model training.
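
As a rough illustration of the kind of building block involved, the sketch below implements a generic spectral (Fourier) convolution layer in PyTorch and applies it to a masked 2-D field. It is an assumption-laden toy, not the authors' architecture: the layer sizes, the mask-as-channel input, and the grid shape are placeholders.

    # Minimal sketch of a spectral convolution block: transform to Fourier space,
    # mix channels with a 1x1 convolution (global receptive field), transform back.
    import torch
    import torch.nn as nn

    class SpectralBlock(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            # real and imaginary parts are concatenated along the channel axis
            self.freq_conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
            self.act = nn.ReLU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, h, w = x.shape
            spec = torch.fft.rfft2(x, norm="ortho")                # (b, c, h, w//2+1)
            z = torch.cat([spec.real, spec.imag], dim=1)
            z = self.act(self.freq_conv(z))
            real, imag = z.chunk(2, dim=1)
            spec = torch.complex(real, imag)
            return torch.fft.irfft2(spec, s=(h, w), norm="ortho")  # back to grid space

    # Toy usage: a sea-surface-temperature-like field with a missing-data mask.
    field = torch.randn(1, 1, 64, 128)            # (batch, channel, lat, lon)
    mask = (torch.rand(1, 1, 64, 128) > 0.5).float()
    inp = torch.cat([field * mask, mask], dim=1)  # observed values + mask channel
    out = SpectralBlock(channels=2)(inp)
    print(out.shape)  # torch.Size([1, 2, 64, 128])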


RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations

Zhao, Yilun, Zhao, Chen, Nan, Linyong, Qi, Zhenting, Zhang, Wenlin, Tang, Xiangru, Mi, Boyu, Radev, Dragomir

arXiv.org Artificial Intelligence

Despite significant progress having been made in question answering on tabular data (Table QA), it is unclear whether, and to what extent, existing Table QA models are robust to task-specific perturbations, e.g., replacing key question entities or shuffling table columns. To systematically study the robustness of Table QA models, we propose a benchmark called RobuT, which builds upon existing Table QA datasets (WTQ, WikiSQL-Weak, and SQA) and includes human-annotated adversarial perturbations in terms of table headers, table content, and questions. Our results indicate that both state-of-the-art Table QA models and large language models (e.g., GPT-3) with few-shot learning falter on these adversarial sets. We propose to address this problem by using large language models to generate adversarial examples to enhance training, which significantly improves the robustness of Table QA models. Our data and code are publicly available at https://github.com/yilunzhao/RobuT.
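
One of the perturbation families mentioned above, shuffling table columns, is easy to illustrate programmatically. The helper below is a hypothetical example, not part of the RobuT release, showing the kind of invariance a robust Table QA model should satisfy.

    # Illustrative sketch: a column-shuffle perturbation for a simple table structure.
    import random

    def shuffle_columns(table: dict, seed: int = 0) -> dict:
        """table = {"header": [...], "rows": [[...], ...]}; returns a column-permuted copy."""
        rng = random.Random(seed)
        order = list(range(len(table["header"])))
        rng.shuffle(order)
        return {
            "header": [table["header"][i] for i in order],
            "rows": [[row[i] for i in order] for row in table["rows"]],
        }

    table = {
        "header": ["Player", "Team", "Points"],
        "rows": [["A. Smith", "Lions", "24"], ["B. Jones", "Bears", "31"]],
    }
    perturbed = shuffle_columns(table)
    # A robust Table QA model should answer "Who scored 31 points?" identically
    # on `table` and `perturbed`.
    print(perturbed["header"])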


Regionalized models for Spanish language variations based on Twitter

Tellez, Eric S., Moctezuma, Daniela, Miranda, Sabino, Graff, Mario, Ruiz, Guillermo

arXiv.org Artificial Intelligence

Spanish is one of the most widely spoken languages in the world, but it is not necessarily written and spoken the same way in different countries. Understanding local language variations can help improve model performance on regional tasks, both by capturing local linguistic structures and by better interpreting a message's content. For instance, consider a machine learning engineer automating a language classification task for a particular region, or a social scientist trying to understand a regional event echoed on social media; both can take advantage of dialect-based language models to work with more contextual information and hence more precision. This manuscript presents and describes a set of regionalized resources for the Spanish language built on four years of public Twitter messages geotagged in 26 Spanish-speaking countries. We introduce word embeddings based on FastText, language models based on BERT, and per-region sample corpora. We also provide a broad comparison among regions covering lexical and semantic similarities, as well as examples of using the regional resources on message classification tasks.
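
As a usage illustration, the snippet below trains two tiny FastText models on invented "regional" corpora with gensim and compares a word's neighbors. The paper's released resources are instead pre-trained on years of geotagged tweets per country; the corpora and vocabulary here are toy assumptions, not the actual resources.

    # Illustrative sketch only: regional lexical variation via FastText neighbors.
    from gensim.models import FastText

    corpus_a = [["busco", "chamba", "en", "la", "ciudad"],
                ["la", "chamba", "paga", "poco"]] * 50
    corpus_b = [["busco", "trabajo", "en", "la", "ciudad"],
                ["el", "trabajo", "paga", "poco"]] * 50

    model_a = FastText(sentences=corpus_a, vector_size=32, min_count=1, epochs=20)
    model_b = FastText(sentences=corpus_b, vector_size=32, min_count=1, epochs=20)

    # With real regional corpora, neighbor lists like these surface lexical variation
    # (e.g., "chamba" vs. "trabajo" for "job") across Spanish-speaking countries.
    print(model_a.wv.most_similar("chamba", topn=3))
    print(model_b.wv.most_similar("trabajo", topn=3))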


The Venezuelans Trying to Escape Their Country Through Video Game Grunt Work

Slate

On a recent afternoon in Maracaibo, Venezuela, Alexander Marinez, who has short-cropped black hair and three-to-four-day stubble, sat in front of his computer tracking herbiboars in the mushroom forests on Fossil Island. He pressed down on his glowing mouse, the newest addition to his otherwise timeworn gaming setup. The pixelated character on his computer screen followed the tracks of a hedgehoglike creature with triangular tusks and herbs growing out of its back. Outside Marinez's one-story house, the sun bore down on the dirt road. His home lies about six miles away from the strait that connects the Caribbean Sea with Lake Maracaibo, one of the world's richest sources of oil. The character inspected a tunnel. Suddenly, the herbiboar appeared, and the character attacked, stunning it.


This is Artificial Intelligence's dirty little secret - Gadgets Now

#artificialintelligence

SAN FRANCISCO: There's a dirty little secret about artificial intelligence: It's powered by hundreds of thousands of real people. From makeup artists in Venezuela to women in conservative parts of India, people around the world are doing the digital equivalent of needlework -- drawing boxes around cars in street photos, tagging images, and transcribing snatches of speech that computers can't quite make out. Such data feeds directly into "machine learning" algorithms that help self-driving cars wind through traffic and let Alexa figure out that you want the lights on. These repetitive tasks pay pennies apiece. But in bulk, this work can offer a decent wage in many parts of the world -- even in the U.S.


Artificial intelligence has a dirty little secret: It's powered by people

#artificialintelligence

There's a dirty little secret about artificial intelligence: It's powered by an army of real people. From makeup artists in Venezuela to women in conservative parts of India, people around the world are doing the digital equivalent of needlework -- drawing boxes around cars in street photos, tagging images, and transcribing snatches of speech that computers can't quite make out. Such data feeds directly into "machine learning" algorithms that help self-driving cars wind through traffic and let Alexa figure out that you want the lights on. These repetitive tasks pay pennies apiece. But in bulk, this work can offer a decent wage in many parts of the world -- even in the U.S. And it underpins a technology that could change humanity forever: AI that will drive us around, execute verbal commands without flaw, and -- possibly -- one day think on its own.

